Research Objectives

The goal of this project is to study the problem of
developing correct and efficient distributed software, i.e.,
software that consists of cooperating processes. The specific
focus is on the problems that arise from the distributed nature
of such software.

Our objective is to extend techniques developed for sequential
programs to distributed programs.

The development of a sequential program consists of posing
assertions, constructing a program to ensure that these assertions
are maintained, and then proving that they are. Termination
is shown by demonstrating that execution of the program
reduces some metric which is bounded from below. The major
problems in extending this methodology to distributed systems
are: (1) no one process can assert, unilaterally, that an invariant
holds, because some other process may cause the invariant
to be violated; and (2) a distributed system may terminate, for
instance in the form of a deadlock, even though no process in
the system has terminated; furthermore, no process can assert
unilaterally that the entire system has terminated. The fundamental
problem with distributed systems is to ensure cooperation
among processes in maintaining invariants and in achieving proper
termination. Therefore, the focus of this project was, and continues
to be, issues of cooperation.
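
To make the sequential methodology concrete, here is a minimal Python sketch using Euclid's algorithm, a hypothetical example not taken from this report. The precondition, loop invariant and postcondition are checked at run time with `assert` (a Floyd-Hoare proof would discharge them statically), and termination follows from a metric, here the value of `b`, that is bounded below by 0 and strictly decreases on every iteration. The function name is our own; `math.gcd` is used only to state the invariant. In a distributed program, the difficulty noted above is that no single process could evaluate such an assertion on its own.

```python
from math import gcd as reference_gcd  # used only to state the invariant


def gcd_with_proof_obligations(x: int, y: int) -> int:
    """Euclid's algorithm annotated with the assertions a sequential proof
    would establish; here they are merely checked while the program runs."""
    assert x >= 0 and y >= 0 and (x, y) != (0, 0)  # precondition
    a, b = x, y
    target = reference_gcd(x, y)
    while b != 0:
        # Invariant: gcd(a, b) equals gcd(x, y), and a, b are non-negative.
        assert reference_gcd(a, b) == target and a >= 0 and b >= 0
        metric_before = b  # termination metric, bounded below by 0
        a, b = b, a % b
        # The metric strictly decreases, since 0 <= a % b < b when b > 0.
        assert 0 <= b < metric_before
    # Postcondition: the invariant together with b == 0 gives a == gcd(x, y).
    assert a == target
    return a
```

For example, `gcd_with_proof_obligations(48, 18)` passes through the states (18, 12), (12, 6), (6, 0) and returns 6.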

The project has been extremely successful in the last year,
resulting in the identification of fundamental problems and the
publication of solutions to some of these problems. Work was
carried out in three areas:

1. Distributed algorithms for detecting termination of
distributed computations.

2. Methods for proving correctness of distributed
software.

3. The development of distributed algorithms to solve
problems in various application areas.


Termination Detection


A distributed computation may terminate due to a deadlock
or because the computation has been successfully completed.

The major impetus for developing distributed termination detection
algorithms has come from distributed data bases, where the
concern is to detect deadlock (because data bases may be presumed
to run indefinitely). Therefore, our primary goal last
year was to develop correct, practical and simple distributed
algorithms to detect deadlock. Motivation for attacking the
problem was also derived from the following statement in a
recent paper by Gligor and Shattuck: "Renewed interest in distributed
systems has resulted in the publication of at least ten
protocols for deadlock detection. However, few of these protocols
are correct and fewer appear to be practical."

In a system consisting of processes that communicate only
with a single central agent, deadlock can be detected easily
because the central agent has complete information about every
process. Deadlock detection is more difficult when there is
no such central agent and processes may communicate directly
with one another. If we could assume that message communication
is instantaneous, or place other restrictions on message delays,
deadlock detection would become simpler. However, the only realistic
general assumption is that message delays are arbitrary (but
finite). We present deadlock detection algorithms for networks
of processes in which there is no single central agent and in
which message delays are arbitrary (but finite). We only assume
that messages sent by a process A to a process B are received by
B in the order sent by A.
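
The channel assumption can be illustrated with a small simulation. The sketch below is our own illustration, with hypothetical names (`FifoChannel`, `max_delay`, `deliver_until`): each message on the channel from A to B receives an arbitrary but finite delay (drawn at random here), yet no message is ever scheduled for delivery before one that A sent earlier, so B receives A's messages in the order A sent them.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FifoChannel:
    """One-directional channel from a sender A to a receiver B.

    Delays are arbitrary (random here) but finite; FIFO order is enforced by
    never scheduling a delivery earlier than the previous one on this channel.
    """
    max_delay: float = 10.0
    rng: random.Random = field(default_factory=random.Random)
    _last_delivery: float = 0.0
    _in_transit: list = field(default_factory=list)

    def send(self, now: float, message) -> None:
        delay = self.rng.uniform(0.0, self.max_delay)  # arbitrary but finite
        delivery = max(now + delay, self._last_delivery)  # preserve FIFO order
        self._last_delivery = delivery
        self._in_transit.append((delivery, message))

    def deliver_until(self, time: float) -> list:
        """Return, in delivery order, all messages that have arrived by `time`."""
        arrived = [m for t, m in self._in_transit if t <= time]
        self._in_transit = [(t, m) for t, m in self._in_transit if t > time]
        return arrived
```

For example, after `ch = FifoChannel(max_delay=5.0, rng=random.Random(0))` and sends of 0, 1, ..., 99, any sequence of `deliver_until` calls yields the messages in exactly that order, although how many have arrived at a given time depends on the random delays. Timestamps are kept only to make the delays visible; a plain queue per channel would model the same ordering guarantee.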

We consider two models of deadlock in message communicating
systems: resource and communication deadlocks. Deadlock detection
algorithms are given for both models. Most models of deadlock
in distributed data bases are resource deadlock models
[5,6,7,8,9,10,11,14]; in these models deadlock arises because
processes may wait permanently for resources held by one
another. The communication deadlock model is a more abstract
and more general model of deadlock; it is applicable to any message
communicating system of processes.
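
To make the distinction between the two models concrete, the following sketch checks each one centrally on a snapshot of a wait-for graph. It is only a reference definition for illustration, not a distributed detector. We assume, as is common but not stated in this report, that in the resource model a blocked process waits for all processes in its dependent set, while in the communication model it waits for a message from any one of them. We also assume that no messages are in transit and that an active process may eventually send to any process waiting on it. The function name `deadlocked` is hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


def deadlocked(waits_for: Dict[str, Set[str]],
               combine: Callable[[Iterable[bool]], bool]) -> Set[str]:
    """Return the processes that can never proceed, for a wait-for snapshot.

    `waits_for[p]` is the set of processes p is blocked on (every process must
    appear as a key); an empty set means p is active. `combine` is `all` for
    the resource (AND) model and `any` for the communication (OR) model.
    """
    can_proceed = {p for p, deps in waits_for.items() if not deps}
    changed = True
    while changed:  # least fixpoint: grow the set of processes that can proceed
        changed = False
        for p, deps in waits_for.items():
            if p not in can_proceed and deps and combine(d in can_proceed for d in deps):
                can_proceed.add(p)
                changed = True
    return set(waits_for) - can_proceed
```

With `waits = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": set()}`, the resource model reports `{"P1", "P2"}` as deadlocked, because P2 needs both P1 and P3, whereas the communication model reports no deadlock, because a message from the active process P3 suffices to unblock P2, which can then unblock P1.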

We have presented and proved the correctness of simple,
practical algorithms to detect resource and communication
deadlocks [1,2,12].
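
The algorithms of [1,2,12] are not reproduced in this report. As a rough illustration of how detection can proceed without a central agent, the sketch below simulates a generic edge-chasing probe scheme for the resource model: a blocked process sends a probe along its wait-for edges, each blocked process forwards the probe once along its own edges, and the initiator declares deadlock if its own probe returns to it. A single FIFO queue stands in for the network. The names are hypothetical, and the sketch assumes the wait-for graph does not change while probes are travelling.

```python
from collections import deque
from typing import Dict, Set


def probe_detects_deadlock(initiator: str, waits_for: Dict[str, Set[str]]) -> bool:
    """Simulate probes (origin, sender, receiver) travelling along wait-for edges.

    Each process acts only on its own outgoing edges and on the probes it
    receives; returns True exactly when the initiator lies on a cycle of
    blocked processes.
    """
    if not waits_for[initiator]:
        return False  # an active process is not deadlocked
    in_transit = deque((initiator, initiator, d) for d in waits_for[initiator])
    forwarded: Set[str] = set()  # processes that already forwarded this probe
    while in_transit:
        origin, _sender, receiver = in_transit.popleft()
        if receiver == origin:
            return True  # the probe came back along a cycle of waiting processes
        if waits_for[receiver] and receiver not in forwarded:
            forwarded.add(receiver)  # forward at most once per initiator
            in_transit.extend((origin, receiver, d) for d in waits_for[receiver])
    return False
```

In the graph `{"P0": {"P1"}, "P1": {"P2"}, "P2": {"P1"}}`, a probe started by P1 returns, while one started by P0 does not, even though P0 is stuck waiting on the deadlocked cycle. A probe scheme of this kind detects deadlock only at processes on the cycle itself.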

Methods for Proving Correctness of Distributed Software

Our primary goal in this area has been to extend well-known
sequential programming proof constructs, such as precondition,
postcondition and the use of metrics in proving termination,
to distributed programs. The obvious advantage of
using techniques which are extensions of sequential-programming
techniques is that the tools and the experience gained from
sequential programming can be applied to distributed programs as
well. We work with a general model of distributed systems; we
do not require that distributed programs be coded in any particular
language for purposes of proof.

The key features of our method are:

1. Modular Specification: We present a scheme for specifying
processes in a modular fashion. The specification
relies exclusively on a process's interaction with its
environment and is independent of process implementation.

2. Hierarchy: We present inference rules by which a
specification for a network is derived from specifications
of component processes. Thus the proof of a
network is not concerned with implementations of
component processes.

3. Compatibility with Sequential Programming Proof Techniques:
We have extended well-known sequential programming proof
constructs such as precondition, postcondition and the
ideas of termination proof to distributed systems. Those
familiar with the Floyd-Hoare proof technique for sequential
programming should find our method straightforward.
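
One way to picture a specification that depends only on a process's interaction with its environment is as a predicate over the sequences of messages the process has received and sent. The sketch below is a simplified illustration of our own, not the notation of the method described here. A hypothetical `copier` process is specified by "what it has sent is a prefix of what it has received", and a pipeline of two copiers inherits the property "output is a prefix of input" purely from the component specifications, because the prefix relation is transitive. Neither copier's implementation is consulted.

```python
from typing import Sequence


def is_prefix(shorter: Sequence, longer: Sequence) -> bool:
    return len(shorter) <= len(longer) and list(longer[:len(shorter)]) == list(shorter)


def copier_spec(received: Sequence, sent: Sequence) -> bool:
    """Safety specification of a copier: everything it has sent so far is a
    prefix of what it has received so far. Only channel histories appear."""
    return is_prefix(sent, received)


def network_spec(inp: Sequence, out: Sequence) -> bool:
    """Specification of the two-copier pipeline as a whole."""
    return is_prefix(out, inp)


def composition_rule_holds(inp: Sequence, mid: Sequence, out: Sequence) -> bool:
    """Inference step: the component specifications imply the network
    specification, for histories inp -> copier1 -> mid -> copier2 -> out."""
    components_ok = copier_spec(inp, mid) and copier_spec(mid, out)
    return (not components_ok) or network_spec(inp, out)
```

For instance, with input history `[1, 2, 3]`, middle history `[1, 2]` and output history `[1]`, both copiers meet their specification and the network specification holds, as the rule predicts.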

We use some ideas from sequential program proofs in proofs
of message-passing systems. In an annotated proof of a sequential
program, each statement s has a precondition pre(s) and a
postcondition post(s). The proof shows that if assertion pre(s)
holds prior to execution of s, then post(s) holds following execution
of s, provided that execution of s terminates. We shall use the
precondition/postcondition concept for describing process safety
properties. Proofs of liveness (or termination) in sequential
programs are based on demonstrating the existence of a metric
such that the execution of each statement causes the metric to
decrease in value. We will use a similar technique in process
proofs. However, processes can wait indefinitely for messages,
something that conventional sequential programs do not do; to
handle this we introduce a new concept called activity, which
is the condition under which a process will definitely send or
receive a message. Other liveness properties are derived from
the basic property of activity and from safety.

We have developed a coherent extension of sequential programming
proof techniques to distributed programs. Several
examples are found in Ossefort [15].

The Development of Distributed Algorithms to Solve Problems in
Various Application Areas

We have attempted to develop distributed algorithms in two
application areas: simulation and graph problems. The application
areas were chosen because of their importance and the
familiarity of the principal investigators with these areas.

Our pioneering effort in the distributed simulation area has
received wide recognition; therefore, our effort last year was
primarily in graph problems. By developing distributed solutions to
important problems, we hoped to gain experience in writing and
proving distributed programs, as well as to make a contribution
to the literature on algorithms.

We began by developing a distributed solution for one of
the most-studied problems in graphs: finding the shortest path
between vertices. A distributed solution is important in the
following situation: communication paths are being set up between
processes in an unreliable, and perhaps even hostile, environment.
Since no process has information about all other processes
in the network, centralized, sequential-programming algorithms
cannot be used.


We developed a distributed algorithm to compute shortest
paths in graphs that may contain negative cycles. We also demonstrated
the application of our shortest-path algorithm in solving a
variety of graph problems, including depth-first search.
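
The report does not describe the shortest-path algorithm itself. As a hedged illustration of the setting, the sketch below simulates a synchronous-round, distributed Bellman-Ford computation. This is a standard technique and not necessarily the algorithm developed here. Each vertex process knows only the weights of its own outgoing edges and the distance offers its predecessors send it. A negative cycle reachable from the source shows up as estimates that are still decreasing after n - 1 rounds. Synchronous rounds are a simplifying assumption; the model in this report allows arbitrary finite message delays. The names `Graph` and `distributed_bellman_ford` are hypothetical.

```python
import math
from typing import Dict, Tuple

Graph = Dict[str, Dict[str, float]]  # vertex -> {successor: edge weight}


def distributed_bellman_ford(graph: Graph, source: str) -> Tuple[Dict[str, float], bool]:
    """Simulate rounds of message exchange; return (distances, negative_cycle).

    Every successor must also appear as a key of `graph`. In each round, every
    vertex with a finite estimate d(u) sends d(u) + w(u, v) to each successor v,
    and each vertex keeps the smallest value it has been offered.
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    n = len(graph)
    for _ in range(n):  # n - 1 rounds suffice without negative cycles; round n tests for one
        messages = [(v, dist[u] + w) for u in graph if dist[u] < math.inf
                    for v, w in graph[u].items()]
        changed = False
        for v, offer in messages:
            if offer < dist[v]:
                dist[v] = offer
                changed = True
        if not changed:
            return dist, False  # quiescence: estimates are shortest distances
    return dist, True  # still improving after n rounds: reachable negative cycle
```

For `{"s": {"a": 4, "b": 1}, "b": {"a": 2}, "a": {}}` with source `"s"`, the estimates settle at s = 0, b = 1, a = 3 and no negative cycle is reported. Adding a cycle a -> b -> a of total weight -1 makes the function report a negative cycle.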

Another important problem in graphs is that of detecting
knots: a vertex $v_i$ in a directed graph is in a knot if, for every
vertex $v_j$ reachable from $v_i$, $v_i$ is also reachable from $v_j$. The
problem of knot detection is important because of its relevance
to deadlocks [3,4]. We developed a scheme whereby a vertex (which
is represented by a process) can determine whether it belongs to a
knot [13].
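
When the whole graph is known, the definition can be checked directly. That is exactly what a single vertex process cannot do in the distributed setting, so the centralized sketch below is only a reference for what the scheme of [13] must decide, not the scheme itself. We additionally assume, as is usual for knots, that "reachable" means reachable by a path of one or more edges and that a vertex with no outgoing edges is not in a knot. The function names are hypothetical.

```python
from typing import Dict, Set

Digraph = Dict[str, Set[str]]  # vertex -> set of successors


def reachable(graph: Digraph, start: str) -> Set[str]:
    """Vertices reachable from `start` by a path of one or more edges."""
    seen: Set[str] = set()
    stack = list(graph[start])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph[v])
    return seen


def in_knot(graph: Digraph, v: str) -> bool:
    """v is in a knot if every vertex reachable from v can reach v again
    (and, by assumption, v has at least one outgoing edge)."""
    reach = reachable(graph, v)
    return bool(reach) and all(v in reachable(graph, w) for w in reach)
```

In `{"a": {"b"}, "b": {"a"}, "c": {"a"}, "d": {"e"}, "e": set()}`, vertices a and b are in a knot, while c is not (it reaches a, which cannot reach c), and neither d nor e is.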


Computing Network-Wide Functions

We found that there was a sizable class of problems with
the following structure: processes in a network cooperate in
computing a result, which we call the global-result, where

    global-result = f(local-result(i), for all processes i)

Here local-result(i) is some result computed in process i at
its termination, and f is any computable function. The knot
detection problem is only one of many problems that fall within
this class. We developed general solutions for this class
of problems. Our solutions were proved correct, and their application
to specific practical problems was demonstrated.
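
A simple way to compute such a global result, sketched below under our own assumptions, is a convergecast over a spanning tree that is assumed to be given. Each process sends its parent the local results collected from its subtree, and the root applies f once it has heard from all of its children. This illustrates the problem class, not the general solutions referred to above. Because f is arbitrary, the sketch forwards the raw local results rather than combining them partially along the way. The recursion stands in for messages travelling up the tree, and the name `convergecast` is hypothetical.

```python
from typing import Callable, Dict, List, Optional


def convergecast(parent: Dict[str, Optional[str]],
                 local_result: Dict[str, object],
                 f: Callable[[Dict[str, object]], object]) -> object:
    """Compute f over all local results, given each process's parent in a
    spanning tree (the root's parent is None)."""
    children: Dict[str, List[str]] = {p: [] for p in parent}
    for p, q in parent.items():
        if q is not None:
            children[q].append(p)
    root = next(p for p, q in parent.items() if q is None)

    def report(p: str) -> Dict[str, object]:
        # The message p sends to its parent: its own result plus its children's reports.
        collected = {p: local_result[p]}
        for c in children[p]:
            collected.update(report(c))
        return collected

    return f(report(root))
```

For a tree with root r, children a and b of r, and c a child of a, with local results 3, 5, 1 and 7, taking f as the maximum yields 7, and taking f as the sum yields 16. Where f is associative, each process could instead send a single partially combined value.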

SUMMARY 

The past year has been very productive. If we can maintain
the same rate of productivity in the future, we shall be very
pleased. In the future we plan to enter new areas and to ensure
that the results of the past year are accepted and used by the
computer science community.